239 research outputs found
Frobenius norm regularization for the multivariate von Mises distribution
Penalizing the model complexity is necessary to avoid overfitting when the number of data samples
is low with respect to the number of model parameters. In this paper, we introduce a penalization
term that places an independent prior distribution for each parameter of the multivariate von Mises
distribution. We also propose a circular distance that can be used to estimate the Kullback–Leibler
divergence between any two circular distributions as a goodness-of-fit measure. We compare the
resulting regularized von Mises models on synthetic data and real neuroanatomical data to show
that the distribution fitted using the penalized estimator generally achieves better results than
the nonpenalized multivariate von Mises estimator
Towards Gaussian Bayesian network fusion
Data sets are growing in complexity thanks to the increasing
facilities we have nowadays to both generate and store data. This poses
many challenges to machine learning that are leading to the proposal of
new methods and paradigms, in order to be able to deal with what is
nowadays referred to as Big Data. In this paper we propose a method
for the aggregation of different Bayesian network structures that have
been learned from separate data sets, as a first step towards mining data
sets that need to be partitioned in a horizontal way, i.e. with respect
to the instances, in order to be processed. Considerations that should be
taken into account when dealing with this situation are discussed. Scalable
learning of Bayesian networks is slowly emerging, and our method
constitutes one of the first insights into Gaussian Bayesian network aggregation
from different sources. Tested on synthetic data it obtains good
results that surpass those from individual learning. Future research will
be focused on expanding the method and testing more diverse data sets
Multi-facet determination for clustering with Bayesian networks
Real world applications of sectors like industry, healthcare or finance usually generate data of
high complexity that can be interpreted from different viewpoints. When clustering this type of
data, a single set of clusters may not suffice, hence the necessity of methods that generate multiple
clusterings that represent different perspectives. In this paper, we present a novel multi-partition
clustering method that returns several interesting and non-redundant solutions, where each of them
is a data partition with an associated facet of data. Each of these facets represents a subset of the
original attributes that is selected using our information-theoretic criterion UMRMR. Our approach
is based on an optimization procedure that takes advantage of the Bayesian network factorization
to provide high quality solutions in a fraction of the time
Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers
In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of
mining concept-drifting data streams. However, most of these approaches can only be applied to uni-dimensional classification
problems where each input instance has to be assigned to a single output class variable. The problem of mining
multi-dimensional data streams, which include multiple output class variables, is largely unexplored and only a few streaming
multi-dimensional approaches have been recently introduced. In this paper, we propose a novel adaptive method, named
Locally Adaptive-MB-MBC (LA-MB-MBC), for mining streaming multi-dimensional data. To this end, we make use of
multi-dimensional Bayesian network classifiers (MBCs) as models. Basically, LA-MB-MBC monitors the concept drift over time
using the average log-likelihood score and the Page-Hinkley test. Then, if a concept drift is detected, LA-MB-MBC adapts the
current MBC network locally around each changed node. An experimental study carried out using synthetic multi-dimensional
data streams shows the merits of the proposed method in terms of concept drift detection as well as classification performance
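The drift-monitoring step described in the abstract above can be illustrated with a minimal sketch of the Page-Hinkley test applied to a stream of average log-likelihood scores. This is not the authors' implementation: the class name `PageHinkley`, the helper `first_drift`, and the values of `delta` and `threshold` are assumptions chosen for illustration, and the local adaptation of the MBC structure is not shown.

```python
class PageHinkley:
    """Page-Hinkley test for a drop in the mean of a monitored score."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # detection threshold lambda (assumed value)
        self.n = 0                  # number of scores seen so far
        self.mean = 0.0             # running mean of the scores
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_max = 0.0          # running maximum M_t of m_t

    def update(self, score):
        """Consume one score; return True if a drop in the mean is signalled."""
        self.n += 1
        self.mean += (score - self.mean) / self.n
        self.cum += score - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        return (self.cum_max - self.cum) > self.threshold


def first_drift(scores, **kwargs):
    """Return the index of the first score that triggers the test, or None."""
    detector = PageHinkley(**kwargs)
    for i, score in enumerate(scores):
        if detector.update(score):
            return i
    return None


# Hypothetical stream: the average log-likelihood drops after 50 batches.
stream = [-1.0] * 50 + [-3.0] * 50
print(first_drift(stream))
```

The sketch watches for a decrease because a model that no longer matches the incoming data would be expected to assign it a lower log-likelihood. A larger `threshold` gives fewer false alarms at the cost of slower detection.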
Learning tractable multidimensional Bayesian network classifiers
Multidimensional classification has become one of the most relevant topics in view of the many
domains that require a vector of class values to be assigned to a vector of given features. The
popularity of multidimensional Bayesian network classifiers has increased in the last few years
due to their expressive power and the existence of methods for learning different families of these
models. The problem with this approach is that the computational cost of using the learned models
is usually high, especially if there are a lot of class variables. Class-bridge decomposability means
that the multidimensional classification problem can be divided into multiple subproblems for these
models. In this paper, we prove that class-bridge decomposability can also be used to guarantee
the tractability of the models. We also propose a strategy for efficiently bounding their inference
complexity, providing a simple learning method with an order-based search that obtains tractable
multidimensional Bayesian network classifiers. Experimental results show that our approach is
competitive with other methods in the state of the art and ensures the tractability of the learned
models
Data publications correlate with citation impact
Neuroscience and molecular biology have been generating large datasets over the past years that are reshaping how research is being conducted. In their wake, open data sharing has been singled out as a major challenge for the future of research. We conducted a comparative study of citations of data publications in both fields, showing that the average publication tagged with a data-related term by the NCBI MeSH (Medical Subject Headings) curators achieves a significantly larger citation impact than the average in either field. We introduce a new metric, the data article citation index (DAC-index), to identify the most prolific authors among those data-related publications. The study is fully reproducible from an executable Rmd (RMarkdown) script together with all the citation datasets. We hope these results can encourage authors to more openly publish their data
Anomaly detection with a spatio-temporal tracking of the laser spot
Anomaly detection is an important problem with many applications in
industry. This paper introduces a new methodology for detecting anomalies in a
real laser heating surface process recorded with a high-speed thermal camera (1000
fps, 32×32 pixels). The system is trained with non-anomalous data only (32 videos
with 21500 frames). The proposed method is built upon kernel density estimation
and is capable of detecting anomalies in time-series data. The classification should
be completed in-process, that is, within the cycle time of the workpiece
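As a rough illustration of training on non-anomalous data only, the sketch below fits a one-dimensional Gaussian kernel density estimate to normal observations and flags new values whose log-density falls below the lowest value seen in training. The feature, the bandwidth, and the thresholding rule are all assumptions; the abstract does not specify them, and the actual method tracks the laser spot in space and time.

```python
import math


def kde_log_density(x, train, bandwidth):
    """Log of a Gaussian kernel density estimate at x."""
    norm = len(train) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train)
    return math.log(total / norm) if total > 0.0 else float("-inf")


def fit_threshold(train, bandwidth):
    """Lowest in-sample log-density, used as the anomaly threshold (assumed rule)."""
    return min(kde_log_density(t, train, bandwidth) for t in train)


def is_anomaly(x, train, bandwidth, threshold):
    """Flag x if it is less likely than every training observation."""
    return kde_log_density(x, train, bandwidth) < threshold


# Hypothetical normal readings, e.g. one summary value per frame.
normal = [i * 0.1 for i in range(-10, 11)]
bw = 0.3  # assumed bandwidth
thr = fit_threshold(normal, bw)
print(is_anomaly(0.0, normal, bw, thr), is_anomaly(10.0, normal, bw, thr))
```

Evaluating the density costs one kernel evaluation per training point, which matters when the decision has to be made within the cycle time of the workpiece.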
Decision functions for chain classifiers based on Bayesian networks for multi-label classification
Multi-label classification problems require each instance to be assigned a subset of a
defined set of labels. This problem is equivalent to finding a multi-valued decision function
that predicts a vector of binary classes. In this paper we study the decision boundaries of
two widely used approaches for building multi-label classifiers, when Bayesian network-augmented
naive Bayes classifiers are used as base models: the binary relevance method
and chain classifiers. In particular, extending previous single-label results to multi-label
chain classifiers, we find polynomial expressions for the multi-valued decision functions
associated with these methods. We prove upper bounds on the expressive power of
both methods and we prove that chain classifiers provide a more expressive model than
the binary relevance method
Dynamic Bayesian network-based anomaly detection for in-process visual inspection of laser surface heat treatment
We present the application of a cyber-physical system for in-process
quality control based on the visual inspection of a laser surface
heat treatment process. To do this, we propose a classification framework
that detects anomalies in recorded video sequences that have been preprocessed
using a clustering-based method for feature subset selection.
One peculiarity of the classification task is that there are no examples
with errors, since major irregularities seldom occur in efficient industrial
processes. Additionally, the parts to be processed are expensive so the
sample size is small. The proposed framework uses anomaly detection,
cross-validation and sampling techniques to deal with these issues. Regarding
anomaly detection, dynamic Bayesian networks (DBNs) are used
to represent the temporal characteristics of the normal process. Experiments
are conducted with two different types of DBN structure learning
algorithms, and classification performance is assessed on both anomaly-free
examples and sequences with anomalies simulated by experts
Directional naive Bayes classifiers
Directional data are ubiquitous in science.
These data have some special properties that rule out the
use of classical statistics. Therefore, different distributions
and statistics, such as the univariate von Mises and the
multivariate von Mises–Fisher distributions, should be
used to deal with this kind of information. We extend the
naive Bayes classifier to the case where the conditional
probability distributions of the predictive variables follow
either of these distributions. We consider the simple scenario,
where only directional predictive variables are used,
and the hybrid case, where discrete, Gaussian and directional
distributions are mixed. The classifier decision
functions and their decision surfaces are studied at length.
Artificial examples are used to illustrate the behavior of the
classifiers. The proposed classifiers are then evaluated over
eight datasets, showing competitive performances against
other naive Bayes classifiers that use Gaussian distributions
or discretization to manage directional data
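A minimal sketch of the simple scenario, where every predictive variable is an angle modelled by a univariate von Mises distribution within each class, could look as follows. The function names are hypothetical, the concentration is estimated with a common closed-form approximation rather than an exact maximum-likelihood solve, and the mean resultant length is capped to keep the estimate finite. None of these choices is taken from the paper.

```python
import math
from collections import defaultdict


def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def fit_von_mises(angles):
    """Estimate (mu, kappa) from angles in radians."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    mu = math.atan2(s, c)
    r = min(math.hypot(c, s), 0.99)            # cap is an assumption
    kappa = r * (2.0 - r * r) / (1.0 - r * r)  # closed-form approximation
    return mu, kappa


def log_von_mises(theta, mu, kappa):
    """Log-density of the von Mises distribution at theta."""
    return kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def train_nb(X, y):
    """Fit class priors and one von Mises per (class, feature)."""
    rows = defaultdict(list)
    for features, label in zip(X, y):
        rows[label].append(features)
    model = {}
    for label, items in rows.items():
        params = [fit_von_mises(column) for column in zip(*items)]
        model[label] = (math.log(len(items) / len(X)), params)
    return model


def predict(model, features):
    """Return the class with the highest log-posterior (up to a constant)."""
    def score(label):
        log_prior, params = model[label]
        return log_prior + sum(
            log_von_mises(t, mu, kappa) for t, (mu, kappa) in zip(features, params)
        )
    return max(model, key=score)


# Hypothetical data: class "a" clusters near 0, class "b" near pi.
X = [[0.1], [-0.2], [0.05], [0.3], [3.0], [-3.1], [3.1], [2.9]]
y = ["a"] * 4 + ["b"] * 4
model = train_nb(X, y)
print(predict(model, [0.0]), predict(model, [-3.0]))
```

The angle -3.0 lies next to 3.0 on the circle, which a Gaussian fitted to the raw values would not capture; this is the kind of wrap-around behavior that motivates directional distributions.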